The Case for Non-Volatile RAM in Cloud HPCaaS
HPC as a service (HPCaaS) is a new way to expose HPC resources via cloud
services. However, porting large-scale, tightly coupled applications with heavy
interprocessor communication to many nodes, as is done on on-premise
supercomputers, remains far from satisfactory in the cloud because of network
latencies. As a consequence, HPCaaS is currently recommended for use with only
one or a few instances in such cases. In this paper we make the claim that a
new class of memory hardware, namely Non-Volatile RAM (NVRAM), can allow such
computations to scale up by an order of magnitude with only a marginal penalty
compared to RAM. Moreover, we suggest that the introduction of NVRAM to HPCaaS
can be cost-effective for both users and suppliers in numerous forms.
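
As a rough illustration of the usage model argued for above (not taken from the
paper; the mount point and sizes are hypothetical), the sketch below shows how an
application could place a large scratch array on NVRAM by memory-mapping a file
on a DAX-mounted namespace, so that data exceeding DRAM capacity spills to NVRAM
on the same node rather than forcing the computation onto additional nodes.

    // Hedged sketch: back a large working array with NVRAM exposed as a
    // DAX-mounted file instead of allocating it in DRAM.
    // Assumption: an fsdax namespace is mounted at /mnt/pmem (hypothetical path).
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        const size_t bytes = 64UL << 30;   // 64 GiB scratch region (example size)
        int fd = open("/mnt/pmem/scratch.bin", O_CREAT | O_RDWR, 0600);
        if (fd < 0 || ftruncate(fd, bytes) != 0) { perror("nvram file"); return 1; }

        // MAP_SHARED on a DAX file gives direct load/store access to the NVRAM,
        // bypassing the page cache; the pointer is then used like ordinary memory.
        double *a = static_cast<double *>(
            mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
        if (a == MAP_FAILED) { perror("mmap"); return 1; }

        const size_t n = bytes / sizeof(double);
        for (size_t i = 0; i < n; i += 1UL << 20)   // touch a subset of pages
            a[i] = static_cast<double>(i);

        munmap(a, bytes);
        close(fd);
        return 0;
    }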
Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators
Over the last decade, most of the increase in computing power has been gained
by advances in accelerated many-core architectures, mainly in the form of
GPGPUs. While accelerators achieve phenomenal performance in various computing
tasks, their utilization requires code adaptations and transformations. Thus,
OpenMP, the most common standard for multi-threading in scientific computing
applications, has offered offloading capabilities between hosts (CPUs) and
accelerators since v4.0, with increasing support in the successive v4.5, v5.0,
v5.1, and the latest v5.2 versions. Recently, two state-of-the-art GPUs - the
Intel Ponte Vecchio Max 1100 and the NVIDIA A100 - were released to the
market, with oneAPI and GNU LLVM-backed compilation for offloading,
respectively. In this work, we present early performance results of OpenMP
offloading to these devices, specifically analyzing the portability of advanced
directives (using SOLLVE's OMPVV test suite) and the scalability of the
hardware on a representative scientific mini-app (the LULESH benchmark). Our
results show that the vast majority of the offloading directives in v4.5 and
v5.0 are supported in the latest oneAPI and GNU compilers; however, support for
v5.1 and v5.2 is still lacking. From the performance perspective, we found that
the PVC is up to 37% faster than the A100 on the LULESH benchmark, showing
better performance in both computation and data movement.
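
For readers unfamiliar with OpenMP offloading, the sketch below (illustrative
only, not taken from the paper's test suite) shows a v4.5-style target region of
the kind exercised by OMPVV and by LULESH-like kernels: arrays are mapped to the
device, the loop is distributed across teams of device threads, and results are
copied back when the region ends.

    // Hedged sketch of a v4.5-style OpenMP offload region (illustrative).
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 1 << 20;
        std::vector<double> x(n, 1.0), y(n, 2.0);
        double *xp = x.data(), *yp = y.data();
        const double a = 0.5;

        // Map the arrays to the device, run the loop across teams of device
        // threads, and copy y back to the host when the region ends.
        #pragma omp target teams distribute parallel for \
                map(to: xp[0:n]) map(tofrom: yp[0:n])
        for (int i = 0; i < n; ++i)
            yp[i] = a * xp[i] + yp[i];

        std::printf("y[0] = %f\n", yp[0]);   // expected 2.5
        return 0;
    }

Compilation flags depend on the toolchain build; indicatively, oneAPI uses
icpx -fiopenmp -fopenmp-targets=spir64, while GCC builds with offloading enabled
accept -fopenmp together with an -foffload target.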
CXL Memory as Persistent Memory for Disaggregated HPC: A Practical Approach
In the landscape of High-Performance Computing (HPC), the quest for efficient
and scalable memory solutions remains paramount. The advent of Compute Express
Link (CXL) introduces a promising avenue with its potential to function as a
Persistent Memory (PMem) solution in the context of disaggregated HPC systems.
This paper presents a comprehensive exploration of CXL memory's viability as a
candidate for PMem, supported by physical experiments conducted on cutting-edge
multi-NUMA nodes equipped with CXL-attached memory prototypes. Our study not
only benchmarks the performance of CXL memory but also illustrates the seamless
transition from traditional PMem programming models to CXL, reinforcing its
practicality.
To substantiate our claims, we establish a tangible CXL prototype using an
FPGA card embodying CXL 1.1/2.0 compliant endpoint designs (Intel FPGA CXL IP).
Performance evaluations, executed through the STREAM and STREAM-PMem
benchmarks, showcase CXL memory's ability to mirror PMem characteristics in
App-Direct and Memory Mode while achieving impressive bandwidth metrics with
Intel 4th generation Xeon (Sapphire Rapids) processors.
The results elucidate the feasibility of CXL memory as a persistent memory
solution, outperforming previously established benchmarks. In contrast to
published DCPMM results, our CXL-DDR4 memory module offers bandwidth comparable
to local DDR4 memory configurations, albeit with a moderate decrease in
performance. The modified STREAM-PMem application demonstrates the ease of
transitioning programming models from PMem to CXL, underscoring the
practicality of adopting CXL memory.
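
To make the programming-model claim concrete, the sketch below (not from the
paper; the device path is hypothetical) shows the App-Direct-style pattern that
such a port preserves: a region is mapped with libpmem and explicitly flushed
for durability. When CXL-attached memory is exposed through an fsdax namespace,
the same code is expected to work unchanged, since the region remains ordinary
load/store-addressable memory.

    // Hedged sketch of the App-Direct-style PMem programming model that carries
    // over to CXL-attached memory; /mnt/cxl0 is a hypothetical DAX mount point.
    #include <libpmem.h>
    #include <cstdio>
    #include <cstring>

    int main() {
        const size_t len = 4096;
        size_t mapped_len = 0;
        int is_pmem = 0;

        // Map (and create) a file on the DAX-mounted persistent region.
        void *addr = pmem_map_file("/mnt/cxl0/log.bin", len, PMEM_FILE_CREATE,
                                   0600, &mapped_len, &is_pmem);
        if (addr == nullptr) { perror("pmem_map_file"); return 1; }

        std::memcpy(addr, "hello, persistent world", 24);

        // Flush CPU caches so the update is durable on the persistent device;
        // fall back to msync-style flushing if the mapping is not true pmem.
        if (is_pmem) pmem_persist(addr, 24);
        else         pmem_msync(addr, 24);

        pmem_unmap(addr, mapped_len);
        return 0;
    }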